{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/integrations/hugging-face/endpoints/00-getting-started.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/integrations/hugging-face/endpoints/00-getting-started.ipynb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting Started with Hugging Face Endpoints\n",
    "\n",
    "Hugging Face Endpoints allows access to straightforward model inference. Coupled with Pinecone we can generate and index high-quality vector embeddings with ease.\n",
    "\n",
    "Let's get started by initializing an Endpoint for generating vector embeddings.\n",
    "\n",
    "## Endpoints\n",
    "\n",
    "We start by heading over to the [Hugging Face Endpoints homepage](https://ui.endpoints.huggingface.co/endpoints) and signing up for an account if needed. After, we should find ourselves on this page:\n",
    "\n",
    "![endpoints 0](https://github.com/pinecone-io/examples/blob/master/integrations/hugging-face/endpoints/assets/hf-endpoints-0.png?raw=true)\n",
    "\n",
    "We click on **Create new endpoint**, choose a model repository (eg name of the model), endpoint name (this can be anything), and select a cloud environment. Before moving on it is *very important* that we set the **Task** to **Sentence Embeddings** (found within the *Advanced configuration* settings).\n",
    "\n",
    "![endpoints 1](https://github.com/pinecone-io/examples/blob/master/integrations/hugging-face/endpoints/assets/hf-endpoints-1.png?raw=true)\n",
    "\n",
    "![endpoints 2](https://github.com/pinecone-io/examples/blob/master/integrations/hugging-face/endpoints/assets/hf-endpoints-2.png?raw=true)\n",
    "\n",
    "Other important options include the *Instance Type*, by default this uses CPU which is cheaper but also slower. For faster processing we need a GPU instance. And finally, we set our privacy setting near the end of the page.\n",
    "\n",
    "After setting our options we can click **Create Endpoint** at the bottom of the page. This action should take use to the next page where we will see the current status of our endpoint.\n",
    "\n",
    "![endpoints 3](https://github.com/pinecone-io/examples/blob/master/integrations/hugging-face/endpoints/assets/hf-endpoints-3.png?raw=true)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the status has moved from **Building** to **Running** (this can take some time), we're ready to begin creating embeddings with it.\n",
    "\n",
    "## Creating Embeddings\n",
    "\n",
    "Each endpoint is given an **Endpoint URL**, it can be found on the endpoint **Overview** page. We need to assign this endpoint URL to the `endpoint_url` variable.\n",
    "\n",
    "![endpoints 4](https://github.com/pinecone-io/examples/blob/master/integrations/hugging-face/endpoints/assets/hf-endpoints-4.png?raw=true)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "endpoint = \"<<ENDPOINT_URL>>\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will also need the organization API token, we find this via the organization settings on Hugging Face (`https://huggingface.co/organizations/<ORG_NAME>/settings/profile`). This is assigned to the `api_org` variable.\n",
    "\n",
    "![endpoints 5](https://github.com/pinecone-io/examples/blob/master/integrations/hugging-face/endpoints/assets/hf-endpoints-5.png?raw=true)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "api_org = \"<<API_ORG_TOKEN>>\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we're ready to create embeddings via Endpoints. Let's start with a toy example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "# add the api org token to the headers\n",
    "headers = {\n",
    "    'Authorization': f'Bearer {api_org}'\n",
    "}\n",
    "# we add sentences to embed like so\n",
    "json_data = {\"inputs\": [\"a happy dog\", \"a sad dog\"]}\n",
    "# make the request\n",
    "res = requests.post(\n",
    "    endpoint,\n",
    "    headers=headers,\n",
    "    json=json_data\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We should see a `200` response."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Response [200]>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "res"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Inside the response we should find two embeddings..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(res.json()['embeddings'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also see the dimensionality of our embeddings like so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "768"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dim = len(res.json()['embeddings'][0])\n",
    "dim"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will need more than two items to search through, so let's download a larger dataset. For this we will use Hugging Face datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.7/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n",
      "Downloading builder script: 3.82kB [00:00, 1.67MB/s]                   \n",
      "Downloading metadata: 1.90kB [00:00, 1.08MB/s]                 \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading and preparing dataset snli/plain_text (download: 90.17 MiB, generated: 65.51 MiB, post-processed: Unknown size, total: 155.68 MiB) to /home/jupyter/.cache/huggingface/datasets/snli/plain_text/1.0.0/1f60b67533b65ae0275561ff7828aad5ee4282d0e6f844fd148d05d3c6ea251b...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Downloading: 100%|██████████| 1.93k/1.93k [00:00<00:00, 992kB/s]\n",
      "Downloading: 100%|██████████| 1.26M/1.26M [00:00<00:00, 31.2MB/s]\n",
      "Downloading: 100%|██████████| 65.9M/65.9M [00:01<00:00, 57.9MB/s]\n",
      "Downloading: 100%|██████████| 1.26M/1.26M [00:00<00:00, 43.6MB/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset snli downloaded and prepared to /home/jupyter/.cache/huggingface/datasets/snli/plain_text/1.0.0/1f60b67533b65ae0275561ff7828aad5ee4282d0e6f844fd148d05d3c6ea251b. Subsequent calls will reuse this data.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Dataset({\n",
       "    features: ['premise', 'hypothesis', 'label'],\n",
       "    num_rows: 550152\n",
       "})"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from datasets import load_dataset\n",
    "\n",
    "snli = load_dataset(\"snli\", split='train')\n",
    "snli"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "SNLI contains 550K sentence pairs, many of these include duplicate items so we will take just one set of these (the *hypothesis*) and deduplicate them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "480042"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "passages = list(set(snli['hypothesis']))\n",
    "len(passages)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will drop to 50K sentences so that the example is quick to run, if you have time, feel free to keep the full 480K."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "passages = passages[:50_000]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Vector DB\n",
    "\n",
    "With our endpoint and dataset ready, all that we're missing is a vector database. For this, we need to initialize our connection to Pinecone, this requires a [free API key](https://app.pinecone.io/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from pinecone import Pinecone\n",
    "\n",
    "# initialize connection to pinecone (get API key at app.pinecone.io)\n",
    "api_key = os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY'\n",
    "\n",
    "# configure client\n",
    "pc = Pinecone(api_key=api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we setup our index specification, this allows us to define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pinecone import ServerlessSpec\n",
    "\n",
    "cloud = os.environ.get('PINECONE_CLOUD') or 'aws'\n",
    "region = os.environ.get('PINECONE_REGION') or 'us-east-1'\n",
    "\n",
    "spec = ServerlessSpec(cloud=cloud, region=region)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we create a new index called `'hf-endpoints'`, the name isn't important *but* the `dimension` must align to our endpoint model output dimensionality (we found this in `dim` above) and the model metric (typically `cosine` is okay, but not for all models)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "index_name = 'hf-endpoints'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "# check if index already exists (it shouldn't if this is first time)\n",
    "if index_name not in pc.list_indexes().names():\n",
    "    # if does not exist, create index\n",
    "    pc.create_index(\n",
    "        index_name,\n",
    "        dimension=dim,\n",
    "        metric='cosine',\n",
    "        spec=spec\n",
    "    )\n",
    "    # wait for index to be initialized\n",
    "    while not pc.describe_index(index_name).status['ready']:\n",
    "        time.sleep(1)\n",
    "\n",
    "# connect to index\n",
    "index = pc.Index(index_name)\n",
    "# view index stats\n",
    "index.describe_index_stats()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create and Index Embeddings\n",
    "\n",
    "Now we have all of our components ready; endpoints, dataset, and Pinecone. Let's go ahead and create our dataset embeddings and index them within Pinecone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 782/782 [11:02<00:00,  1.18it/s]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'dimension': 768,\n",
       " 'index_fullness': 0.1,\n",
       " 'namespaces': {'': {'vector_count': 50000}},\n",
       " 'total_vector_count': 50000}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from tqdm.auto import tqdm\n",
    "\n",
    "# we will use batches of 64\n",
    "batch_size = 64\n",
    "\n",
    "for i in tqdm(range(0, len(passages), batch_size)):\n",
    "    # find end of batch\n",
    "    i_end = min(i+batch_size, len(passages))\n",
    "    # extract batch\n",
    "    batch = passages[i:i_end]\n",
    "    # generate embeddings for batch via endpoints\n",
    "    res = requests.post(\n",
    "        endpoint,\n",
    "        headers=headers,\n",
    "        json={\"inputs\": batch}\n",
    "    )\n",
    "    emb = res.json()['embeddings']\n",
    "    # get metadata (just the original text)\n",
    "    meta = [{'text': text} for text in batch]\n",
    "    # create IDs\n",
    "    ids = [str(x) for x in range(i, i_end)]\n",
    "    # add all to upsert list\n",
    "    to_upsert = list(zip(ids, emb, meta))\n",
    "    # upsert/insert these records to pinecone\n",
    "    _ = index.upsert(vectors=to_upsert)\n",
    "\n",
    "# check that we have all vectors in index\n",
    "index.describe_index_stats()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With everything indexed we can begin querying. We will take a few examples from the *premise* column of the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Query: A person on a horse jumps over a broken down airplane.\n",
      "Answers:\n",
      "The horse jumps over a toy airplane.\n",
      "a lady rides a horse over a plane shaped obstacle\n",
      "A person getting onto a horse.\n",
      "person rides horse\n",
      "A woman riding a horse jumps over a bar.\n"
     ]
    }
   ],
   "source": [
    "query = snli['premise'][0]\n",
    "print(f\"Query: {query}\")\n",
    "# encode with HF endpoints\n",
    "res = requests.post(endpoint, headers=headers, json={\"inputs\": query})\n",
    "xq = res.json()['embeddings']\n",
    "# query and return top 5\n",
    "xc = index.query(vector=xq, top_k=5, include_metadata=True)\n",
    "# iterate through results and print text\n",
    "print(\"Answers:\")\n",
    "for match in xc['matches']:\n",
    "    print(match['metadata']['text'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These look good, let's try a couple more examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Query: A woman is walking across the street eating a banana, while a man is following with his briefcase.\n",
      "Answers:\n",
      "A woman eats a banana and walks across a street, and there is a man trailing behind her.\n",
      "A woman eats a banana split.\n",
      "A woman is carrying two small watermelons and a purse while walking down the street.\n",
      "The woman walked across the street.\n",
      "A woman walking on the street with a monkey on her back.\n"
     ]
    }
   ],
   "source": [
    "query = snli['premise'][100]\n",
    "print(f\"Query: {query}\")\n",
    "# encode with HF endpoints\n",
    "res = requests.post(endpoint, headers=headers, json={\"inputs\": query})\n",
    "xq = res.json()['embeddings']\n",
    "# query and return top 5\n",
    "xc = index.query(vector=xq, top_k=5, include_metadata=True)\n",
    "# iterate through results and print text\n",
    "print(\"Answers:\")\n",
    "for match in xc['matches']:\n",
    "    print(match['metadata']['text'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And one more..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Query: People on bicycles waiting at an intersection.\n",
      "Answers:\n",
      "A pair of people on bikes are waiting at a stoplight.\n",
      "Bike riders wait to cross the street.\n",
      "people on bicycles\n",
      "Group of bike riders stopped in the street.\n",
      "There are bicycles outside.\n"
     ]
    }
   ],
   "source": [
    "query = snli['premise'][200]\n",
    "print(f\"Query: {query}\")\n",
    "# encode with HF endpoints\n",
    "res = requests.post(endpoint, headers=headers, json={\"inputs\": query})\n",
    "xq = res.json()['embeddings']\n",
    "# query and return top 5\n",
    "xc = index.query(vector=xq, top_k=5, include_metadata=True)\n",
    "# iterate through results and print text\n",
    "print(\"Answers:\")\n",
    "for match in xc['matches']:\n",
    "    print(match['metadata']['text'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All of these results look excellent. If you are not planning on running your endpoint and vector DB beyond this tutorial, you can shut down both.\n",
    "\n",
    "**Once the index is deleted, you cannot use it again.**\n",
    "\n",
    "Shut down the endpoint by navigating to the endpoint **Overview** page and selecting **Delete endpoint**. Delete the Pinecone index with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "pc.delete_index(index_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  }
 ],
 "metadata": {
  "environment": {
   "kernel": "python3",
   "name": "common-cpu.m95",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/base-cpu:m95"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12 (main, Apr  5 2022, 01:52:34) \n[Clang 12.0.0 ]"
  },
  "vscode": {
   "interpreter": {
    "hash": "b8e7999f96e1b425e2d542f21b571f5a4be3e97158b0b46ea1b2500df63956ce"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
